Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs

Authors

Abstract

The wide adoption of the emerging SmartNIC technology creates new opportunities to offload application-level computation into the networking layer, which relieves the burden on host CPUs and leads to performance improvement. Shuffle, the all-to-all data exchange process, is a critical building block for network communication in distributed data-intensive applications and can potentially benefit from SmartNICs. In this paper, we develop SmartShuffle, which accelerates a data-intensive application's shuffle process by offloading various tasks to SmartNIC devices. SmartShuffle supports offloading both low-level functions, including partitioning and transport, and high-level tasks, including filtering, aggregation, and sorting. SmartShuffle adopts a coordinated architecture that makes sender-side and receiver-side SmartNICs jointly contribute to the benefits of offloading. SmartShuffle carefully manages the tight and time-varying memory constraints on the SmartNIC device. We propose a liquid approach, which dynamically migrates operators between the host CPU and the SmartNIC at runtime such that resources on both devices are fully utilized. We prototype SmartShuffle on the Stingray SoC SmartNIC and plug it into Spark. Our evaluation shows that SmartShuffle improves CPU and I/O efficiency with lower job completion time. SmartShuffle outperforms Spark and Spark RDMA by up to 40% on TPC-H.
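To make the shuffle vocabulary in the abstract concrete, here is a minimal Python sketch of a generic hash-partitioned shuffle: each sender filters and pre-aggregates its records, then splits them into one bucket per receiver, and each receiver merges the buckets addressed to it and sorts the result. It also includes a toy operator-placement function to picture moving operators between host and NIC as free NIC memory changes. This is not SmartShuffle's implementation: the function names, the CRC32 partitioner, the per-key sum aggregate, and the greedy prefix policy in `place_operators` are all illustrative assumptions, and the paper's liquid approach uses its own runtime mechanism.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int]
Predicate = Callable[[str, int], bool]


def partition_of(key: str, num_partitions: int) -> int:
    # Stable hash (CRC32) so every sender maps a key to the same receiver;
    # Python's built-in hash() for str is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def sender_side(records: Iterable[Record], num_partitions: int,
                keep: Predicate) -> List[List[Record]]:
    """Filter, pre-aggregate (sum per key), then hash-partition into buckets."""
    partial: Dict[str, int] = defaultdict(int)
    for key, value in records:
        if keep(key, value):        # filtering
            partial[key] += value   # sender-side aggregation
    buckets: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in partial.items():
        buckets[partition_of(key, num_partitions)].append((key, value))
    return buckets


def receiver_side(batches: Iterable[List[Record]]) -> List[Record]:
    """Merge partial aggregates from every sender and sort by key."""
    totals: Dict[str, int] = defaultdict(int)
    for batch in batches:
        for key, value in batch:
            totals[key] += value
    return sorted(totals.items())


def shuffle(senders: List[List[Record]], num_partitions: int,
            keep: Predicate) -> List[List[Record]]:
    """All-to-all exchange: receiver p gets bucket p from every sender."""
    sent = [sender_side(r, num_partitions, keep) for r in senders]
    return [receiver_side(out[p] for out in sent) for p in range(num_partitions)]


def place_operators(demand_mb: Dict[str, int], nic_free_mb: int,
                    pipeline: List[str]) -> Dict[str, str]:
    """Hypothetical greedy policy: run a prefix of the pipeline on the NIC
    while it fits in free NIC memory; the rest of the pipeline runs on the host."""
    placement: Dict[str, str] = {}
    budget, on_nic = nic_free_mb, True
    for op in pipeline:
        if on_nic and demand_mb[op] <= budget:
            placement[op] = "nic"
            budget -= demand_mb[op]
        else:
            on_nic = False              # keep the NIC part a contiguous prefix
            placement[op] = "host"
    return placement


if __name__ == "__main__":
    senders = [[("a", 1), ("b", 2), ("a", 3)], [("b", 5), ("c", 7), ("a", -1)]]
    parts = shuffle(senders, num_partitions=2, keep=lambda k, v: v > 0)
    print(sorted(kv for part in parts for kv in part))
    # [('a', 4), ('b', 7), ('c', 7)]
    print(place_operators({"filter": 16, "aggregate": 64, "sort": 128}, 100,
                          ["filter", "aggregate", "sort"]))
    # {'filter': 'nic', 'aggregate': 'nic', 'sort': 'host'}
```

Calling `place_operators` again whenever the free NIC memory changes is one simple way to picture runtime migration. Keeping the NIC-resident operators a contiguous prefix of the pipeline means records cross between the two devices once instead of bouncing back and forth.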

Similar sources

Accelerating Data Intensive Applications using MapReduce

Information explosion propelled by the exponential growth in digitised data is an unstoppable reality. To be able to extract relevant and useful knowledge from this voluminous data in order to make well-informed decisions is a competitive advantage in the information age. However, the attempts to transform raw data into valuable knowledge face both data and computational intensive challenges. As...

Testing Data Consistency of Data-Intensive Applications Using QuickCheck

Many software systems are data-intensive and use data management systems for data storage, such as Relational Database Management Systems (RDBMS). RDBMSs are used to store information in a structured manner, and to define several types of constraints on the data, to maintain basic consistency. The RDBMSs are mature, well tested, software products that one can trust to reliably store data and ...

Designing Data-Intensive Applications

This Preview Edition of Designing Data-Intensive Applications, Chapters 1 and 2, is a work in progress. The final book is currently scheduled for release in July 2015 and will be available at oreilly.com and other retailers once it is published. O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safarib...

A parallel arithmetic array for accelerating compute-intensive applications

A parallel arithmetic array processor for accelerating compute-intensive applications in low-power embedded systems is proposed in this study. The proposed flexible hardware architecture enables the fast execution of both control-dominated and compute-centric streaming computation tasks on the same array. Consequently, multiple levels of parallelism can be efficiently exploited. A test chip int...

FastBit: An Efficient Indexing Technology For Accelerating Data-Intensive Science

FastBit is a software tool for searching large read-only datasets. It organizes user data in a column-oriented structure which is efficient for on-line analytical processing (OLAP), and utilizes compressed bitmap indices to further speed up query processing. Analyses have proven the compressed bitmap index used in FastBit to be theoretically optimal for one-dimensional queries. Compared with oth...

Journal

Journal title: Proceedings of the ACM on Measurement and Analysis of Computing Systems

Year: 2023

ISSN: 2476-1249

DOI: https://doi.org/10.1145/3589980